Font and Function Word Identification in Document Recognition
Authors
Abstract
An algorithm is presented that identifies the predominant font in which the running text in an English language document is printed. Frequent function words (such as the, of, and, a, and to) are also recognized as part of the font identification. Clusters of word images are generated from an input document and matched to a database of function words derived from fonts and document images. The font or document that matches best provides the identification of the predominant font and function words. This technique takes advantage of the fact that most machine-printed documents are prepared with a single predominant font. Also, the repeated words in the document are utilized to overcome noise in the input. Advantages of this technique include its use as a preprocessing step for a document recognition algorithm. Experimental results show high accuracy is achieved on a database of original and degraded document images. © 1996 Academic Press, Inc.

... font would be used during recognition. This would reduce the confusion caused by training on many fonts and would effectively reduce the recognition problem to choosing the correct class from one font rather than from many fonts. One method for identifying the font in which a document is printed is to match the individual character images to the character images in a font database. The font of the identified characters provides the desired information. A disadvantage of such a technique is that it only uses visual information from isolated characters and is thus sensitive to noise that is commonly present in photocopies and facsimiles. The font identification algorithm presented in this paper detects the predominant font in a given document, that is, the single font that is used in most documents to print ...
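The pipeline described in the abstract (cluster word images, match the clusters to per-font function-word templates, keep the best-matching font) can be illustrated with a toy sketch. This is not the authors' implementation: the tiny binary tuples standing in for word images, the Hamming distance, the greedy clustering rule, the majority-vote prototype, and every name below (`cluster_words`, `prototype`, `identify_font`, `font_db`) are assumptions made purely for illustration.

```python
def hamming(a, b):
    """Number of differing pixels between two equal-length binary images."""
    return sum(x != y for x, y in zip(a, b))


def cluster_words(images, threshold):
    """Greedy clustering (an assumed rule): an image joins the first
    cluster whose seed image is within `threshold` pixels of it."""
    clusters = []  # each cluster is a list of images; element 0 is the seed
    for img in images:
        for members in clusters:
            if hamming(img, members[0]) <= threshold:
                members.append(img)
                break
        else:
            clusters.append([img])
    return clusters


def prototype(members):
    """Pixel-wise majority vote, so repeated words average out noise."""
    n = len(members)
    return tuple(int(2 * sum(column) > n) for column in zip(*members))


def identify_font(images, font_db, threshold=2, top_k=3):
    """Return (font, labels): the font whose function-word templates are
    closest to the prototypes of the top_k largest clusters, and the word
    assigned to each of those clusters."""
    clusters = sorted(cluster_words(images, threshold), key=len, reverse=True)
    protos = [prototype(c) for c in clusters[:top_k]]
    best_font, best_cost, best_labels = None, None, None
    for font, templates in font_db.items():
        labels, cost = [], 0
        for proto in protos:
            word, dist = min(
                ((w, hamming(proto, t)) for w, t in templates.items()),
                key=lambda pair: pair[1],
            )
            labels.append(word)
            cost += dist
        if best_cost is None or cost < best_cost:
            best_font, best_cost, best_labels = font, cost, labels
    return best_font, best_labels


# Hypothetical 8-pixel "word images" for two made-up fonts.
font_db = {
    "FontA": {"the": (1, 1, 1, 1, 0, 0, 0, 0), "of": (0, 0, 0, 0, 1, 1, 1, 1)},
    "FontB": {"the": (1, 0, 1, 0, 1, 0, 1, 0), "of": (0, 1, 0, 1, 0, 1, 0, 1)},
}
# Noisy occurrences of FontA's "the" (three copies) and "of" (two copies).
page = [
    (1, 1, 1, 1, 0, 0, 0, 1),
    (0, 1, 1, 1, 0, 0, 0, 0),
    (1, 1, 1, 1, 0, 0, 0, 0),
    (0, 0, 0, 0, 1, 1, 1, 1),
    (0, 0, 0, 1, 1, 1, 1, 1),
]
font, labels = identify_font(page, font_db)
print(font, labels)
```

The majority vote in `prototype` is the part that mirrors the abstract's point about repeated words: a pixel flipped in one occurrence of a word is outvoted by the other occurrences before any matching against the font database takes place.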
Similar resources
Optical Font Recognition from Projection Profiles
• Recognition of logical document structures [1], where knowledge of the font used in a word, line, or text block may be useful for defining its logical label (chapter title, section title or paragraph). • Document reproduction, where knowledge of the font is necessary in order to reproduce (reprint) the document. • Document indexing and information retrieval, where word indexes are generally p...
FONT DISCRIMINATION USING FRACTAL DIMENSIONS
One of the related problems of OCR systems is discrimination of fonts in machine printed document images. This task improves performance of general OCR systems. Proposed methods in this paper are based on various fractal dimensions for font discrimination. First, some predefined fractal dimensions were combined with directional methods to enhance font differentiation. Then, a novel fractal dime...
A study on font-family and font-size recognition applied to Arabic word images at ultra-low resolution
In this paper, we propose a new font and size identification method for ultra-low resolution Arabic word images using a stochastic approach. The literature has proved the difficulty for Arabic text recognition systems to treat multi-font and multi-size word images. This is due to the variability induced by some font family, in addition to the inherent difficulties of Arabic writing including cu...
Using Typography in Document Image Analysis
Even if font usage plays an important role in Document Image Analysis (DIA), recognition systems generally take the concept of font management in a weaker sense than in the production cycle. With the point of view of the document recognition community, we show how typographic information (characters bitmap, metrics, etc.) can improve existing analysis methods. After a brief survey of font recog...
Font Identification in Historical Documents Using Active Learning
Identifying the type of font (e.g., Roman, Blackletter) used in historical documents can help optical character recognition (OCR) systems produce more accurate text transcriptions. Towards this end, we present an active-learning strategy that can significantly reduce the number of labeled samples needed to train a font classifier. Our approach extracts image-based features that exploit geometric...
Journal title:
- Computer Vision and Image Understanding
Volume 63, Issue
Pages -
Publication date 1996